The Microsoft 2017 Conversational Speech Recognition System

نویسندگان

  • W. Xiong
  • L. Wu
  • F. Alleva
  • Jasha Droppo
  • X. Huang
  • Andreas Stolcke
چکیده

We describe the 2017 version of Microsoft’s conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a twostage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state of the art models. Collection of labeled conversational audio however, is prohibitively expensive, laborious and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with suf...

متن کامل

The ISL evaluation system for Verbmobil-II

This paper describes the 2000 ISL large vocabulary speech recognition system for fast decoding of conversational speech which was used in the German Verbmobil-II project. The challenge of this task is to build robust acoustic models to handle di erent dialects, spontaneous e ects, and crosstalk as occur in conversational speech. We present speaker incremental normalization and adaptation experi...

متن کامل

and Understanding of Noisy Conversational Speech

As the amount of speech data available increases rapidly, so does the need for efficient search and understanding. Techniques such as Spoken Term Detection (STD), which focuses on finding instances of a particular spoken word or phrase in a corpus, try to address this problem by locating the query word with the desired meaning. However, STD may not provide the desired result, if the Automatic S...

متن کامل

Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German

We describe the Microsoft Speech Language Translation (MSLT) corpus, which was created in order to evaluate endto-end conversational speech translation quality. The corpus was created from actual conversations over Skype, and we provide details on the recording setup and the different layers of associated text data. The corpus release includes Test and Dev sets with reference transcripts for sp...

متن کامل

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1708.06073  شماره 

صفحات  -

تاریخ انتشار 2017